Efficient Hold-Out for Subset of Regressors
Authors
Abstract
Hold-out and cross-validation are among the most useful methods for model selection and performance assessment of machine learning algorithms. In this paper, we present a computationally efficient algorithm for calculating the hold-out performance of sparse regularized least-squares (RLS) when the method has already been trained with the whole training set. The computational complexity of performing the hold-out is O(|H|³ + |H|²n), where |H| is the size of the hold-out set and n is the number of basis vectors. The algorithm can thus be used to calculate various types of cross-validation estimates efficiently. For example, when m is the number of training examples, the complexities of N-fold and leave-one-out cross-validation are O(m³/N² + (m²n)/N) and O(mn), respectively. Further, since sparse RLS can be trained in O(mn) time for several regularization parameter values in parallel, the fast hold-out algorithm enables efficient selection of the optimal parameter value.
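The hold-out computation described above generalizes the classical leave-one-out shortcut for regularized least-squares. As a minimal illustration of the idea (plain dense ridge regression in NumPy, not the paper's sparse-RLS algorithm), the leave-one-out residuals for the |H| = 1 case follow from a single full-data fit via the diagonal of the hat matrix, with no retraining per fold:

```python
import numpy as np

def loo_residuals_ridge(X, y, lam):
    """Leave-one-out residuals for ridge regression without retraining.

    Uses the classical identity e_{-i} = (y_i - yhat_i) / (1 - H_ii),
    where H = X (X^T X + lam I)^{-1} X^T is the hat matrix of the
    model trained once on all m examples.
    """
    m, n = X.shape
    # Solve for G = (X^T X + lam I)^{-1} X^T once; shape (n, m).
    G = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)
    H = X @ G                      # hat matrix, shape (m, m)
    y_hat = H @ y                  # full-data predictions
    # Held-out residual for each example, from one fit and diag(H).
    return (y - y_hat) / (1.0 - np.diag(H))
```

This identity holds exactly for the ridge objective, so the m leave-one-out errors cost no more than one training run, mirroring how the paper's algorithm amortizes hold-out evaluation over a single fit on the whole training set.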
Similar papers
Vector Autoregressive Model Selection: Gross Domestic Product and Europe Oil Prices Data Modelling
We consider the problem of model selection in the vector autoregressive model with normal innovations. Tests such as Vuong's and Cox's tests are provided for order and model selection, i.e., for selecting the order and a suitable subset of regressors in the vector autoregressive model. We propose a modified log-likelihood ratio test for selecting subsets of regressors. The Europe oil prices, ...
Multiresponse Sparse Regression with Application to Multidimensional Scaling
Sparse regression is the problem of selecting a parsimonious subset of all available regressors for an efficient prediction of a target variable. We consider a general setting in which both the target and regressors may be multivariate. The regressors are selected by a forward selection procedure that extends the Least Angle Regression algorithm. Instead of the common practice of estimating eac...
It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covarian...
Determinant Efficiencies in Ill-Conditioned Models
The canonical correlations between subsets of OLS estimators are identified with design linkage parameters between their regressors. Known collinearity indices are extended to encompass angles between each regressor vector and remaining vectors. One such angle quantifies the collinearity of regressors with the intercept, of concern in the corruption of all estimates due to ill-conditioning. Mat...
Well-dispersed subsets of non-dominated solutions for MOMILP problem
This paper uses the weighted L1-norm to propose an algorithm for finding a well-dispersed subset of non-dominated solutions of the multiple objective mixed integer linear programming problem. When all variables are integer it finds the whole set of efficient solutions. In each iteration of the proposed method only a mixed integer linear programming problem is solved and its optimal solutions gen...